Create evals for diverse user inputs #2

Merged
stephenc222 merged 9 commits into main from cursor/create-evals-for-diverse-user-inputs-d372 on Jul 7, 2025
@stephenc222 (Collaborator)

Implement a basic node-level evaluation harness to test individual intent graph nodes in isolation.
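
The harness itself is not shown in this conversation. As a rough illustration of what evaluating a single node in isolation could look like, here is a minimal sketch. Every name in it (`EvalCase`, `EvalResult`, `evaluate_node`, `toy_intent_node`) is hypothetical and is not taken from intent-kit; the node is modeled simply as a callable from user text to an output.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class EvalCase:
    """One labelled example: a user input and the output the node should produce."""
    user_input: str
    expected: Any


@dataclass
class EvalResult:
    """Outcome of running one case against a node."""
    case: EvalCase
    actual: Any
    passed: bool


def evaluate_node(node: Callable[[str], Any], cases: List[EvalCase]) -> List[EvalResult]:
    """Run each case through a single node, with no surrounding graph."""
    results = []
    for case in cases:
        try:
            actual = node(case.user_input)
        except Exception as exc:  # a crash is recorded as a failed case, not a crashed run
            actual = exc
        results.append(EvalResult(case, actual, actual == case.expected))
    return results


def accuracy(results: List[EvalResult]) -> float:
    """Fraction of cases that passed; 0.0 for an empty run."""
    return sum(r.passed for r in results) / len(results) if results else 0.0


# Hypothetical stand-in for an intent-classifying node (keyword matching only).
def toy_intent_node(text: str) -> str:
    lowered = text.lower()
    if "weather" in lowered:
        return "get_weather"
    if "hello" in lowered:
        return "greet"
    return "unknown"


if __name__ == "__main__":
    cases = [
        EvalCase("Hello there", "greet"),
        EvalCase("What's the WEATHER in Paris?", "get_weather"),
        EvalCase("Book me a flight", "unknown"),
    ]
    results = evaluate_node(toy_intent_node, cases)
    for r in results:
        print(f"{'PASS' if r.passed else 'FAIL'}: {r.case.user_input!r} -> {r.actual!r}")
    print(f"accuracy: {accuracy(results):.2f}")
```

Treating the node as a plain callable keeps the sketch independent of any graph or LLM client, which is also what would make a mock mode on CI straightforward: a deterministic stand-in can replace the real node.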

stephenc222 force-pushed the cursor/create-evals-for-diverse-user-inputs-d372 branch 2 times, most recently from 56f29e0 to 8f8679a, July 7, 2025 15:12
stephenc222 force-pushed the cursor/create-evals-for-diverse-user-inputs-d372 branch from 8f8679a to 8b0b326, July 7, 2025 15:12
stephenc222 merged commit 55dcc9b into main, Jul 7, 2025
1 check passed
stephenc222 deleted the cursor/create-evals-for-diverse-user-inputs-d372 branch, July 7, 2025 15:14
stephenc222 added a commit that referenced this pull request Jul 13, 2025
* Checkpoint before follow-up message

* Add dependencies and generate evaluation dataset and outputs

Co-authored-by: stephenc211 <stephenc211@gmail.com>

* Checkpoint before follow-up message

* Checkpoint before follow-up message

* Add node evaluation framework with sample NLU and slot filling nodes

Co-authored-by: stephenc211 <stephenc211@gmail.com>

* Implement eval framework for intent-kit with new datasets, sample nodes, reporting, updated config, and tests. Remove old files.

* update github CI/CD pipeline

* cleanup, use mock mode on public CI

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>